Abstract

Ziv-Lempel and Crochemore factorization are two kinds of factorizations of words related to 
text processing. In this paper, we find these factorizations for standard episturmian words. 
Thus the previously known c-factorization of standard Sturmian words is provided as a special 
case. Moreover, the two factorizations are compared. 
1 Introduction

Some factorizations of finite words were studied by Ziv and Lempel in a seminal paper [7]. These factorizations are related to information theory and text processing. Several years later, Crochemore introduced another factorization of words for the design of a linear-time algorithm to detect squares in a word [2], [3], [4]. The Ziv-Lempel and Crochemore factorizations are similar in some cases but differ significantly in others.
In [1], Crochemore factorizations of some well-known infinite words, namely characteristic Sturmian words, (generalized) Thue-Morse words and the period doubling sequence, are explicitly given based on their combinatorial structures. It is also shown there that, in general, the number of factors in the Crochemore factorization is at most twice the number of factors of the Ziv-Lempel factorization.

The Crochemore factorization (or c-factorization for short) of a word $w$ is defined as follows: each factor of $c(w)$ is either a fresh letter, or a maximal factor of $w$ which has already occurred in the prefix of the word. More formally, the c-factorization $c(w)$ of a word $w$ is

$$c(w) = (c_1, \ldots, c_m, c_{m+1}, \ldots),$$

where $c_m$ is the longest prefix of $c_m c_{m+1} \cdots$ occurring twice in $c_1 \cdots c_m$, or $c_m$ is a letter $a$ which has not occurred in $c_1 \cdots c_{m-1}$.
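The definition above can be turned into a brute-force procedure for finite words. The following is a minimal sketch and not an efficient algorithm; the function name `c_factorization` is our own choice, and on a finite prefix of an infinite word the last factor may be cut short by the end of the input.

```python
def c_factorization(w: str) -> list[str]:
    """Brute-force Crochemore factorization of a finite word w."""
    factors = []
    pos, n = 0, len(w)
    while pos < n:
        length = 0
        # Extend the candidate while w[pos:pos+length+1] also occurs at some
        # start index < pos (the earlier occurrence may overlap the new one).
        while pos + length < n and w.find(w[pos:pos + length + 1], 0, pos + length) != -1:
            length += 1
        if length == 0:
            length = 1  # fresh letter: it has no earlier occurrence
        factors.append(w[pos:pos + length])
        pos += length
    return factors


print(c_factorization("abaababaabaab")[:5])
```

For the prefix abaababaabaab of the Fibonacci word, the first five factors are a, b, a, aba, baaba.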

The Ziv-Lempel factorization (or z-factorization for short) of a word $w$ is

$$z(w) = (z_1, \ldots, z_m, z_{m+1}, \ldots),$$

where $z_m$ is the shortest prefix of $z_m z_{m+1} \cdots$ which occurs only once in the word $z_1 \cdots z_m$. In this paper, we give explicit formulas for the z-factorization and the c-factorization of standard episturmian words; thus we obtain the previous c-factorization of standard Sturmian words in [1] as a special case. Moreover, these results reveal the relation between the two factorizations in the case of standard episturmian words. The rest of the paper is organized as follows. In Section 2 we present some useful definitions and notation of combinatorics on words. Section 3 reviews the definition and some properties of episturmian words. In Section 4, we study the z-factorization of standard episturmian words. Finally, in Section 5 we present a result about the c-factorization of standard episturmian words.
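The definition of the z-factorization given above can also be sketched by brute force on finite words. This is only an illustration under our own naming (`z_factorization` is not from the paper); the last factor returned for a finite prefix may be truncated and need not be a genuine factor of the infinite word.

```python
def z_factorization(w: str) -> list[str]:
    """Brute-force Ziv-Lempel factorization of a finite word w."""
    factors = []
    pos, n = 0, len(w)
    while pos < n:
        length = 1
        # Grow the candidate while it still has an occurrence starting
        # before pos; stop at the first prefix that occurs only once.
        while pos + length < n and w.find(w[pos:pos + length], 0, pos + length - 1) != -1:
            length += 1
        factors.append(w[pos:pos + length])
        pos += length
    return factors


print(z_factorization("abaababaabaab")[:5])
```

On the same Fibonacci prefix abaababaabaab, the first five factors are a, b, aa, bab, aabaa.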

2 Definitions and notation 

We denote the alphabet (which is finite) by $A$. As usual, we denote by $A^*$ the set of words over $A$ and by $\varepsilon$ the empty word. We use the notation $A^+ = A^* \setminus \{\varepsilon\}$. If $a \in A$ and $w = w_1 w_2 \ldots w_n$ is a word over $A$ with the $w_i \in A$, then the symbols $|w|$ and $|w|_a$ denote respectively the length $n$ of $w$, and the number of occurrences of the letter $a$ in $w$. For an infinite word $w$ we denote by $\mathrm{Alph}(w)$ (resp. $\mathrm{Ult}(w)$) the number of letters which appear (resp. appear infinitely many times) in $w$ (the first notation is also used for finite words). A word $v$ is a factor of a word $w$, written $v \prec w$, if there exist $u, u' \in A^*$ such that $w = uvu'$. A word $v$ is said to be a prefix (resp. suffix) of a word $w$, written $v \triangleleft w$ (resp. $v \triangleright w$), if there exists $u \in A^*$ such that $w = vu$ (resp. $w = uv$). If $w = vu$ (resp. $w = uv$) we simply write $v = wu^{-1}$ (resp. $v = u^{-1}w$). The notions of prefix and factor extend naturally to infinite words. Two words $u$ and $v$ are conjugate if there exist words $p$ and $q$ such that $u = pq$ and $v = qp$. For a word $w$, the set $F(w)$ (resp. $F_n(w)$) is the set of its factors (resp. the set of its factors of length $n$); these notations are also used for infinite words. If $w$ is an infinite word, then the related complexity function is $p_w(n) = |F_n(w)|$. The reversal of $w = w_1 w_2 \cdots w_n$ is $\tilde{w} = w_n w_{n-1} \ldots w_1$. The word $w$ is a palindrome if $w = \tilde{w}$. A word $w \in A^+$ is called primitive if $w = u^m$ with $m \in \mathbb{N}^+$ implies $m = 1$.
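The notions of factor set, palindrome, conjugacy and primitivity can be made concrete on finite words. The helpers below are a small sketch with names of our own choosing; the primitivity test relies on the standard fact that a nonempty word $w$ is primitive exactly when its only occurrences in $ww$ are the prefix and the suffix.

```python
def factors(w: str, n: int) -> set[str]:
    """F_n(w): the set of factors of length n of the finite word w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def is_palindrome(w: str) -> bool:
    """w equals its reversal."""
    return w == w[::-1]


def are_conjugate(u: str, v: str) -> bool:
    """u = pq and v = qp for some words p, q."""
    return len(u) == len(v) and u in v + v


def is_primitive(w: str) -> bool:
    """A nonempty w is primitive iff w does not occur inside ww at an inner position."""
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


print(sorted(factors("abaababaabaab", 3)))
print(is_primitive("abaab"), is_primitive("abab"))
```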



3 Episturmian words 

An infinite word $s$ is episturmian if $F(s)$ is closed under reversal and for any $\ell \in \mathbb{N}$ there exists at most one right special word in $F_\ell(s)$. Then Sturmian words are just nonperiodic episturmian words on a binary alphabet. An episturmian word is standard if all its left special factors are prefixes of it. It is well known that if an episturmian word $t$ is not periodic and $\mathrm{Ult}(t) = k$, then its complexity function is ultimately $p_t(n) = (k-1)n + q$ for some $q \in \mathbb{N}^+$. Let $t$ be an episturmian word. If $t$ is nonperiodic then there exists a unique standard episturmian word $s$ satisfying $F(t) = F(s)$; if $t$ is periodic then we may find several standard episturmian words $s$ satisfying $F(t) = F(s)$. In any case, there exists at least one standard episturmian word $s$ with $F(t) = F(s)$. If the sequence of palindromic prefixes of a standard episturmian word $s$ is $u_1 = \varepsilon, u_2, u_3, \ldots$, then there exists an infinite word $\Delta(s) = x_1 x_2 \cdots$, $x_i \in A$, called its directive word, such that for all $n \in \mathbb{N}^+$,

$$u_{n+1} = (u_n x_n)^{(+)},$$

where $w^{(+)}$ is defined as the shortest palindrome having $w$ as a prefix. (A similar construction for Sturmian words can be found in [5].) The relation between $u_n$ and $u_{n+1}$ can also be explained using morphisms: for $a \in A$ define the morphism $\psi_a$ by $\psi_a(a) = a$ and $\psi_a(x) = ax$ for $x \in A \setminus \{a\}$. Let $\mu_0 = \mathrm{Id}$ and $\mu_n = \psi_{x_1} \psi_{x_2} \cdots \psi_{x_n}$ for $n \in \mathbb{N}^+$. Moreover, let $h_n = \mu_n(x_{n+1})$. Then

$$u_{n+1} = h_{n-1} u_n, \qquad n \in \mathbb{N}^+.$$

From the above equation, it is concluded that

$$u_{n+1} = h_{n-1} \cdots h_1 h_0 = \tilde{h}_0 \tilde{h}_1 \cdots \tilde{h}_{n-1}. \tag{1}$$
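The two descriptions of the palindromic prefixes, by palindromic closure and by the morphisms $\psi_a$, can be compared numerically on a finite prefix of a directive word. The sketch below uses our own function names (`pal_closure`, `palindromic_prefixes`, `h`) and 0-based Python lists, so `u[k]` stands for $u_{k+1}$ and `h(directive, n)` for $h_n$.

```python
def pal_closure(w: str) -> str:
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:        # w[i:] is the longest palindromic suffix
            return w + w[:i][::-1]


def palindromic_prefixes(directive: str) -> list[str]:
    """[u_1, ..., u_{n+1}] for the directive prefix x_1 ... x_n."""
    u = [""]
    for x in directive:
        u.append(pal_closure(u[-1] + x))
    return u


def h(directive: str, n: int) -> str:
    """h_n = mu_n(x_{n+1}) with mu_n = psi_{x_1} ... psi_{x_n}."""
    word = directive[n]
    for a in reversed(directive[:n]):   # innermost morphism psi_{x_n} acts first
        word = "".join(c if c == a else a + c for c in word)
    return word


directive = "abcabc"                    # prefix of the Tribonacci directive word
u = palindromic_prefixes(directive)
for n in range(1, len(directive) + 1):
    # u_{n+1} = h_{n-1} u_n, and equation (1) with the reversed words
    print(u[n] == h(directive, n - 1) + u[n - 1],
          u[n] == "".join(h(directive, k)[::-1] for k in range(n)))
```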

It is known that for any integer $n$, $h_n$ is primitive (see Proposition 2.8 of [6]) and so is $\tilde{h}_n$. For any integer $n$ define $P(n)$ as the maximum value of $i$ satisfying $i < n$ and $x_i = x_n$; if there is no such $i$ then $P(n)$ is undefined. We have the following lemma.

Lemma 1.
(i)

$$h_{n-1} = \begin{cases} u_n x_n & \text{if } P(n) \text{ is undefined,} \\ u_n u_{P(n)}^{-1} & \text{otherwise.} \end{cases}$$

(ii) If $P(n)$ is defined then

$$h_{n-1} = h_{n-2} h_{n-3} \cdots h_{P(n)-1}.$$

Proof. (i) See the end of Section 2.1 of [6].
(ii) This is proved by using part (i) and (1).

□
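Lemma 1(i) can be checked on finite directive prefixes. The following sketch repeats the helper functions so that it is self-contained; all names are ours, and `directive.rfind` is used to locate $P(n)$ (0-based index $P(n)-1$, or $-1$ when $P(n)$ is undefined).

```python
def pal_closure(w: str) -> str:
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def palindromic_prefixes(directive: str) -> list[str]:
    """u[k] is u_{k+1}."""
    u = [""]
    for x in directive:
        u.append(pal_closure(u[-1] + x))
    return u


def h(directive: str, n: int) -> str:
    """h_n = mu_n(x_{n+1})."""
    word = directive[n]
    for a in reversed(directive[:n]):
        word = "".join(c if c == a else a + c for c in word)
    return word


def check_lemma1(directive: str) -> bool:
    u = palindromic_prefixes(directive)
    for n in range(1, len(directive) + 1):       # n is 1-based as in the text
        x_n = directive[n - 1]
        p = directive.rfind(x_n, 0, n - 1)       # 0-based position of x_{P(n)}
        h_prev, u_n = h(directive, n - 1), u[n - 1]
        if p == -1:
            ok = h_prev == u_n + x_n             # h_{n-1} = u_n x_n
        else:
            ok = h_prev + u[p] == u_n            # h_{n-1} u_{P(n)} = u_n
        if not ok:
            return False
    return True


print(check_lemma1("abcabcab"), check_lemma1("aabbab"))
```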

It is obvious that $h_{n-1} \triangleleft h_n$. In addition, by Proposition 2.11 of [6] we have
Lemma 2.

(i) $h_n = h_{n-1}$ if and only if $x_{n+1} = x_n$.

(ii) If $x_{n+1} \neq x_n$ then $u_n$ is a proper prefix of $h_n$.

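Lemma 3. Let $\Delta(s) = x_1 \ldots x_n \ldots$, $x_i \in A$. Suppose that $x_n = a$ and the letter $a$ has at least one appearance before $x_n$ in $\Delta(s)$. Then

(i) $h_{n-1} \triangleleft u_n$ and $\tilde{h}_{n-1} \triangleright u_n$.

(ii) The word $v_{n-1} = u_n (\tilde{h}_{n-1})^{-1}$ is a palindrome.

(iii) $v_{n-1} \triangleright u_{n-1}$ and $v_{n-1} \triangleleft u_{n-1}$.

(iv) $u_n \triangleright u_{n-1} \tilde{h}_{n-1}$.

(v) If moreover $x_n \neq x_{n-1}$, then $u_n \triangleright (\tilde{h}_{n-1})^2$ and $u_{n+1} \triangleright (\tilde{h}_{n-1})^3$.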

Proof. (i) By Lemma 1(i), $h_{n-1} = u_n u_{P(n)}^{-1}$. So $h_{n-1} \triangleleft u_n$, which gives $\tilde{h}_{n-1} \triangleright \tilde{u}_n = u_n$.

(ii) By part (i), there exists a word $v_{n-1}$ satisfying $u_n = v_{n-1} \tilde{h}_{n-1}$. Hence by $u_{n+1} = h_{n-1} u_n$, we obtain $u_{n+1} = h_{n-1} v_{n-1} \tilde{h}_{n-1}$. But since $u_{n+1}$ is palindromic, from the last equation we conclude that so is $v_{n-1}$.

(iii) From $u_n = v_{n-1} \tilde{h}_{n-1} = u_{n-1} \tilde{h}_{n-2}$ and $|h_{n-1}| \geq |h_{n-2}|$ we conclude that $v_{n-1} \triangleleft u_{n-1}$, which yields $v_{n-1} = \tilde{v}_{n-1} \triangleright \tilde{u}_{n-1} = u_{n-1}$.

(iv) This is concluded from $u_n = v_{n-1} \tilde{h}_{n-1}$ using part (iii).

(v) Using part (iii) and Lemma 2(ii), we get $u_n \triangleright (\tilde{h}_{n-1})^2$; combining this with $u_{n+1} = u_n \tilde{h}_{n-1}$, we obtain $u_{n+1} \triangleright (\tilde{h}_{n-1})^3$.

□

The following representation of the directive word is useful for the next sections. Let

$$\Delta(s) = x_1 x_2 \cdots = y_1^{d_1} y_2^{d_2} \cdots$$

where $x_i, y_i \in A$, $y_i \neq y_{i+1}$ and $d_i > 0$ for $i > 0$. Define the function $g : \mathbb{N} \to \mathbb{N}$ by

$$g(m) = d_1 + \cdots + d_{m-1} + 1.$$

Lemma 4. With the above definitions, the following statements hold.

(i) $u_{g(m+1)} = (h_{g(m)-1})^{d_m} u_{g(m)} = u_{g(m)} (\tilde{h}_{g(m)-1})^{d_m}$.

(ii) $u_{g(m+1)} = (h_{g(m)-1})^{d_m} (h_{g(m-1)-1})^{d_{m-1}} \cdots (h_0)^{d_1} = (\tilde{h}_0)^{d_1} (\tilde{h}_{g(2)-1})^{d_2} \cdots (\tilde{h}_{g(m)-1})^{d_m}$.

(iii) $u_{g(m)-1}$ is a proper prefix of $h_{g(m)-1}$.

(iv) $u_{g(m)} \triangleright (\tilde{h}_{g(m)-1})^2$ and $u_{g(m+1)} \triangleright (\tilde{h}_{g(m)-1})^{d_m+2}$.

Proof. (i) For any integer $n$ with $g(m) \leq n < g(m+1)$ we have $x_n = y_m$ and, by Lemma 2(i), $h_{n-1} = h_{g(m)-1}$. Thus for any integer $j$ with $0 < j \leq d_m$ we have

$$u_{g(m)+j} = (h_{g(m)-1})^{j} u_{g(m)} = u_{g(m)} (\tilde{h}_{g(m)-1})^{j}.$$

In particular, for $j = d_m$ the result is obtained.

(ii) This is concluded from (1).

(iii) This is obtained from Lemma 2(ii).

(iv) By Lemma 3(v) we obtain $u_{g(m)} \triangleright (\tilde{h}_{g(m)-1})^2$; using this and $u_{g(m+1)} = u_{g(m)} (\tilde{h}_{g(m)-1})^{d_m}$, we obtain $u_{g(m+1)} \triangleright (\tilde{h}_{g(m)-1})^{d_m+2}$.

□
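The block representation $y_1^{d_1} y_2^{d_2} \cdots$ and Lemma 4(i) can be illustrated as follows. This is a sketch under our own naming; `blocks` recovers the pairs $(y_m, d_m)$ from a finite directive prefix, and the running variable `start` equals $g(m) - 1$, so that `u[start]` is $u_{g(m)}$.

```python
from itertools import groupby


def pal_closure(w: str) -> str:
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def palindromic_prefixes(directive: str) -> list[str]:
    """u[k] is u_{k+1}."""
    u = [""]
    for x in directive:
        u.append(pal_closure(u[-1] + x))
    return u


def h(directive: str, n: int) -> str:
    """h_n = mu_n(x_{n+1})."""
    word = directive[n]
    for a in reversed(directive[:n]):
        word = "".join(c if c == a else a + c for c in word)
    return word


def blocks(directive: str) -> list[tuple[str, int]]:
    """[(y_1, d_1), (y_2, d_2), ...] for a finite directive prefix."""
    return [(y, len(list(run))) for y, run in groupby(directive)]


def check_lemma4_i(directive: str) -> bool:
    u = palindromic_prefixes(directive)
    start = 0                                   # g(m) - 1 for the current block
    for _, d in blocks(directive):
        hm = h(directive, start)                # h_{g(m)-1}
        u_gm, u_next = u[start], u[start + d]   # u_{g(m)} and u_{g(m+1)}
        if u_next != hm * d + u_gm or u_next != u_gm + hm[::-1] * d:
            return False
        start += d
    return True


print(blocks("aabbab"), check_lemma4_i("aabbab"))
```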

4 z-factorization 

Theorem 5. Let $s$ be a standard episturmian word with directive word $\Delta(s) = x_1 x_2 x_3 \ldots = y_1^{d_1} y_2^{d_2} \cdots$, where $y_i \neq y_{i+1}$ for all $i \geq 1$. The z-factorization of $s$ is of the form $z(s) = (z_1, z_2, \ldots)$, where $z_1 = x_1$ and $z_k = y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k$ for $k \geq 2$.
Proof. We prove the result by induction on $k$. It is easily seen that $z_1 = x_1 = y_1$. Now suppose that the result is true for any $j < k$. Thus we have

$$\begin{aligned} z_1 z_2 z_3 \cdots z_{k-1} &= y_1 \, y_1^{-1} (\tilde{h}_{g(1)-1})^{d_1} y_2 \, y_2^{-1} (\tilde{h}_{g(2)-1})^{d_2} y_3 \cdots y_{k-2}^{-1} (\tilde{h}_{g(k-2)-1})^{d_{k-2}} y_{k-1} \\ &= (\tilde{h}_{g(1)-1})^{d_1} (\tilde{h}_{g(2)-1})^{d_2} \cdots (\tilde{h}_{g(k-2)-1})^{d_{k-2}} y_{k-1}. \end{aligned}$$

We should conclude $z_k = y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k$. For this purpose, the following two facts should be proved.

Fact 1. $y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} \prec u_{g(k)} x_1^{-1}$.

Fact 2. $y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k \not\prec u_{g(k)}$.

We prove these facts in two cases.



Case (i). Suppose that $y_{k-1} = a$ has already appeared in $\Delta(s)$. By Lemma 3(i), $\tilde{h}_{g(k-1)-1} \triangleright u_{g(k-1)}$, hence

$$y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} \triangleright u_{g(k-1)} (\tilde{h}_{g(k-1)-1})^{d_{k-1}-1}.$$

But the right side is a proper prefix of $u_{g(k)} = u_{g(k-1)} (\tilde{h}_{g(k-1)-1})^{d_{k-1}}$. This proves Fact 1.
To prove Fact 2, suppose to the contrary that

$$y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k \prec u_{g(k)}. \tag{2}$$

By Lemma 4(iv), $u_{g(k-1)} \triangleright (\tilde{h}_{g(k-1)-1})^2$, so

$$u_{g(k)} = u_{g(k-1)} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} \triangleright (\tilde{h}_{g(k-1)-1})^{d_{k-1}+2}. \tag{3}$$

From (2) and (3) we conclude

$$y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k \prec (\tilde{h}_{g(k-1)-1})^{d_{k-1}+2},$$

which implies that $y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k = w^{d_{k-1}}$ for some word $w$ conjugate to $\tilde{h}_{g(k-1)-1}$; but this is possible only if $y_{k-1} = y_k$, which is a contradiction. Hence, Fact 2 is proved in this case.

Case (ii). Suppose that $y_{k-1} = a$ has not appeared before in $\Delta(s)$; hence, by Lemma 1(i),

$$\tilde{h}_{g(k-1)-1} = y_{k-1} u_{g(k-1)}. \tag{4}$$

Thus Fact 1 is obtained as follows:

$$y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} = u_{g(k-1)} (y_{k-1} u_{g(k-1)})^{d_{k-1}-1} \triangleleft u_{g(k-1)} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} x_1^{-1} = u_{g(k)} x_1^{-1}.$$

In order to prove Fact 2, suppose to the contrary that

$$y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k \prec u_{g(k)}. \tag{5}$$

On the other hand, by (4) and $u_{g(k)} = u_{g(k-1)} (\tilde{h}_{g(k-1)-1})^{d_{k-1}}$ we obtain

$$u_{g(k)} = u_{g(k-1)} (y_{k-1} u_{g(k-1)})^{d_{k-1}} \prec (y_{k-1} u_{g(k-1)})^{d_{k-1}+1} = (\tilde{h}_{g(k-1)-1})^{d_{k-1}+1}. \tag{6}$$

From (5) and (6) we obtain

$$y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k \prec (\tilde{h}_{g(k-1)-1})^{d_{k-1}+1},$$

which implies that $y_{k-1}^{-1} (\tilde{h}_{g(k-1)-1})^{d_{k-1}} y_k = w^{d_{k-1}}$ for some word $w$ conjugate to $\tilde{h}_{g(k-1)-1}$, but this is possible only if $y_{k-1} = y_k$, which is a contradiction. This ends the proof. □
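The formula of Theorem 5 can be compared with a brute-force z-factorization on a long palindromic prefix of $s$. The code below is a sanity check on a few finite directive prefixes, not a proof; all function names are ours, and the brute-force factorization is the naive one obtained directly from the definition.

```python
from itertools import groupby


def pal_closure(w: str) -> str:
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def longest_prefix(directive: str) -> str:
    """Palindromic prefix u_{n+1} of s built from x_1 ... x_n."""
    u = ""
    for x in directive:
        u = pal_closure(u + x)
    return u


def h(directive: str, n: int) -> str:
    """h_n = mu_n(x_{n+1})."""
    word = directive[n]
    for a in reversed(directive[:n]):
        word = "".join(c if c == a else a + c for c in word)
    return word


def z_factorization(w: str) -> list[str]:
    """Naive z-factorization; the last factor may be truncated."""
    factors, pos, n = [], 0, len(w)
    while pos < n:
        length = 1
        while pos + length < n and w.find(w[pos:pos + length], 0, pos + length - 1) != -1:
            length += 1
        factors.append(w[pos:pos + length])
        pos += length
    return factors


def predicted_z(directive: str) -> list[str]:
    """z_1 = x_1 and z_k = y_{k-1}^{-1} (~h_{g(k-1)-1})^{d_{k-1}} y_k."""
    bl = [(y, len(list(run))) for y, run in groupby(directive)]
    pred, start = [directive[0]], 0             # start = g(k-1) - 1
    for (y_prev, d_prev), (y, _) in zip(bl, bl[1:]):
        core = h(directive, start)[::-1] * d_prev
        assert core[0] == y_prev                # so that y_{k-1}^{-1} makes sense
        pred.append(core[1:] + y)
        start += d_prev
    return pred


directive = "abcabc"
pred = predicted_z(directive)
print(pred[:4])
print(z_factorization(longest_prefix(directive))[:len(pred)] == pred)
```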



5 c-factorization 

Theorem 6. Let $s$ be a standard episturmian word with directive word $\Delta(s) = x_1 x_2 x_3 \ldots = y_1^{d_1} y_2^{d_2} y_3^{d_3} \cdots$, where $x_i, y_i \in A$ and $y_i \neq y_{i+1}$ for all $i \geq 1$. If $c(s) = (c_1, c_2, \ldots)$, then there exist integers $i$ and $j$ such that $c_1 \cdots c_k = u_{g(k-j+i+1)}$ for any $k \geq j$. Consequently, we obtain $c_k = (\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}}$ for all $k > j$.

Proof. Let $i = \min\{\ell : \{y_1, y_2, \ldots, y_\ell\} = \{1, 2, \ldots, k\}\}$ and $y_i = a$. Since $y_i = a$ has no occurrence in $u_{g(i)}$, we have $u_{g(i)+1} = u_{g(i)} a u_{g(i)}$, hence there exists an integer $j \geq 3$ satisfying $c_1 \cdots c_{j-2} = u_{g(i)}$ and $c_{j-1} = y_i$. Moreover, by Lemma 1(i), we have

$$\tilde{h}_{g(i)-1} = y_i u_{g(i)}, \tag{7}$$

$$u_{g(i+1)} = u_{g(i)} (y_i u_{g(i)})^{d_i}. \tag{8}$$

Now we are going to prove that $c_j = y_i^{-1} (\tilde{h}_{g(i)-1})^{d_i}$. Denote the right side by $w$ and note that

$$c_1 \cdots c_{j-1} w = u_{g(i)} \, y_i \, y_i^{-1} (\tilde{h}_{g(i)-1})^{d_i} = u_{g(i+1)}.$$

It is clear that $w = u_{g(i)} (y_i u_{g(i)})^{d_i - 1}$ has at least two occurrences in $c_1 \cdots c_{j-1} w = u_{g(i)} (y_i u_{g(i)})^{d_i}$. Thus it is enough to prove that $w y_{i+1} \not\prec u_{g(i+1)}$. Suppose to the contrary that $w y_{i+1} \prec u_{g(i+1)}$, so

$$u_{g(i)} (y_i u_{g(i)})^{d_i - 1} y_{i+1} \prec u_{g(i)} (y_i u_{g(i)})^{d_i}. \tag{9}$$

Since $y_i \notin \mathrm{Alph}(u_{g(i)})$, (9) can happen only if $y_{i+1} = y_i$, which is a contradiction. Thus $c_j = w$ as required.
Now we claim that the following equation

$$c_1 c_2 \cdots c_k = u_{g(k-j+i+1)} \tag{10}$$

holds for any integer $k \geq j$. The statement is true for $k = j$ by the above arguments. We proceed by induction on $k$. Suppose that $k > j$ and that $c_1 c_2 \cdots c_\ell = u_{g(\ell-j+i+1)}$ holds for any integer $\ell$ with $j \leq \ell < k$. By Lemma 4(i), it is enough to show that $c_k = (\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}}$. For this, the following two facts should be proved.

Fact 1. $(\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}} \prec u_{g(k-j+i+1)} x_1^{-1}$.

Fact 2. $(\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}} y_{k-j+i+1} \not\prec u_{g(k-j+i+1)}$.



By Lemma 3(i), $\tilde{h}_{g(k-j+i)-1} \triangleright u_{g(k-j+i)}$, so we obtain

$$(\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}} \triangleright u_{g(k-j+i)} (\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}-1},$$

which together with $u_{g(k-j+i)} (\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}-1} \prec u_{g(k-j+i+1)} x_1^{-1}$ proves Fact 1.

To prove Fact 2, suppose to the contrary that $(\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}} y_{k-j+i+1} \prec u_{g(k-j+i+1)}$. By Lemma 4(iv), this implies that

$$(\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}} y_{k-j+i+1} \prec (\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}+2}.$$

Since $\tilde{h}_{g(k-j+i)-1}$ is primitive, it has just $d_{k-j+i} + 2$ occurrences in the right side; thus the last relation implies that $y_{k-j+i+1}$ equals the first letter of $\tilde{h}_{g(k-j+i)-1}$, i.e. $y_{k-j+i+1} = y_{k-j+i}$, which is a contradiction. □
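Theorem 6 can be checked in the same spirit as Theorem 5. The sketch below, with names of our own choosing, computes the c-factorization of a long palindromic prefix by brute force, locates $i$ and $j$ as in the proof, and then compares each later factor $c_k$ with $(\tilde{h}_{g(k-j+i)-1})^{d_{k-j+i}}$ and each partial product with $u_{g(k-j+i+1)}$. It is a finite sanity check only.

```python
from itertools import accumulate, groupby


def pal_closure(w: str) -> str:
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def h(directive: str, n: int) -> str:
    """h_n = mu_n(x_{n+1})."""
    word = directive[n]
    for a in reversed(directive[:n]):
        word = "".join(c if c == a else a + c for c in word)
    return word


def c_factorization(w: str) -> list[str]:
    """Naive c-factorization; the last factor may be truncated."""
    factors, pos, n = [], 0, len(w)
    while pos < n:
        length = 0
        while pos + length < n and w.find(w[pos:pos + length + 1], 0, pos + length) != -1:
            length += 1
        length = max(length, 1)                   # fresh letter
        factors.append(w[pos:pos + length])
        pos += length
    return factors


def check_theorem6(directive: str) -> bool:
    bl = [(y, len(list(run))) for y, run in groupby(directive)]
    starts = [0]                                  # starts[m-1] = g(m) - 1
    for _, d in bl:
        starts.append(starts[-1] + d)
    u = [""]                                      # u[k] is u_{k+1}
    for x in directive:
        u.append(pal_closure(u[-1] + x))
    c = c_factorization(u[-1])
    lengths = list(accumulate(len(f) for f in c)) # |c_1 ... c_k|
    seen, i = set(), 0
    for i, (y, _) in enumerate(bl, start=1):      # i: all letters have appeared
        seen.add(y)
        if seen == set(directive):
            break
    j = lengths.index(len(u[starts[i - 1]])) + 3  # c_1 ... c_{j-2} = u_{g(i)}
    for k in range(j + 1, len(c) + 1):
        n = k - j + i
        if n >= len(bl):                          # stay away from the truncated end
            break
        expected = h(directive, starts[n - 1])[::-1] * bl[n - 1][1]
        if c[k - 1] != expected or lengths[k - 1] != len(u[starts[n]]):
            return False
    return True


print(check_theorem6("abababab"), check_theorem6("abcabcabc"))
```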

Remark 1. By a slight modification of the argument used in the proof of Theorem 6 we find that the c-factorization of a standard episturmian word is as follows: $c_1 = y_1$ and

$$c_2 = \begin{cases} y_2 & \text{if } d_1 = 1, \\ y_1^{d_1 - 1} & \text{otherwise.} \end{cases}$$

For any integer $m \geq 2$, there exists an integer $n$ such that $c_1 c_2 \cdots c_m = u_{g(n)} \alpha_n$, where either $\alpha_n = \varepsilon$ or $\alpha_n = y_n$. In addition, the next factor, $c_{m+1}$, is given by

$$c_{m+1} = \begin{cases} y_n & \text{if } \alpha_n = \varepsilon \text{ and } y_n \notin \{y_1, \ldots, y_{n-1}\}, \\ (\tilde{h}_{g(n)-1})^{d_n} & \text{if } \alpha_n = \varepsilon \text{ and } y_n \in \{y_1, \ldots, y_{n-1}\}, \\ y_n^{-1} (\tilde{h}_{g(n)-1})^{d_n} & \text{otherwise, i.e. if } \alpha_n = y_n. \end{cases}$$

It follows that if $\alpha_n = \varepsilon$ and $y_n \notin \{y_1, \ldots, y_{n-1}\}$, then $c_1 \cdots c_{m+1} = u_{g(n)} y_n$; otherwise $c_1 \cdots c_{m+1} = u_{g(n+1)}$. Moreover, setting $k_0 = |\mathrm{Alph}(s)|$, it follows that the values $i$ and $j$ in Theorem 6 satisfy the following equation.

$$j - i = \begin{cases} k_0 - 1 & \text{if } d_1 = 1, \\ k_0 & \text{otherwise.} \end{cases}$$



Remark 2. Considering Theorem 5, Theorem 6 and Remark 1, we conclude that from a point on, the formula $z_k = y_{k-1}^{-1} \, c_{k+k_0-1-m} \, y_k$ holds, where $k_0 = |\mathrm{Alph}(s)|$ and

$$m = \begin{cases} 1 & \text{if } d_1 = 1, \\ 0 & \text{otherwise.} \end{cases}$$

Remark 3. From Theorem 6, by using Remark 1, we obtain that from a point on, $c_k = (\tilde{h}_{g(k-k_0+m)-1})^{d_{k-k_0+m}}$, where $k_0$ and $m$ are defined as above. Now if $s$ is standard Sturmian, by using the definitions and notation of Chapter 2 of [8] about standard words and Sturmian words, it is easily proved that $h_{g(p)-1} = s_{p-1}$ for any integer $p \geq 1$. So in this case, by setting $k_0 = 2$, we conclude that from a point on, $c_k = (\tilde{s}_{k+m-3})^{d_{k+m-2}}$. Thus in case $d_1 > 1$ (resp. $d_1 = 1$), by calculating the first four factors (resp. first three factors), we recover Theorem 1 of [1] about the c-factorization of standard Sturmian words.
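The Sturmian special case can be illustrated numerically. The sketch below assumes the usual standard sequence $s_{-1} = b$, $s_0 = a$, $s_p = s_{p-1}^{d_p} s_{p-2}$ and the directive word $a^{d_1} b^{d_2} a^{d_3} \cdots$; both conventions, and all function names, are assumptions made for this illustration. It compares $h_{g(p)-1}$ with $s_{p-1}$ and, for the Fibonacci word ($d_p = 1$ for all $p$), compares a few c-factors with reversed standard words.

```python
def h(directive: str, n: int) -> str:
    """h_n = mu_n(x_{n+1})."""
    word = directive[n]
    for a in reversed(directive[:n]):
        word = "".join(c if c == a else a + c for c in word)
    return word


def standard_words(ds: list[int]) -> list[str]:
    """[s_0, s_1, ..., s_len(ds)] with s_p = s_{p-1}^{d_p} s_{p-2}."""
    s_prev, s_cur = "b", "a"                      # s_{-1}, s_0
    out = [s_cur]
    for d in ds:
        s_prev, s_cur = s_cur, s_cur * d + s_prev
        out.append(s_cur)
    return out


def c_factorization(w: str) -> list[str]:
    """Naive c-factorization; the last factor may be truncated."""
    factors, pos, n = [], 0, len(w)
    while pos < n:
        length = 0
        while pos + length < n and w.find(w[pos:pos + length + 1], 0, pos + length) != -1:
            length += 1
        length = max(length, 1)
        factors.append(w[pos:pos + length])
        pos += length
    return factors


ds = [2, 1, 3, 1]
directive = "".join(("a" if t % 2 == 0 else "b") * d for t, d in enumerate(ds))
s = standard_words(ds)
for p in range(1, len(ds) + 1):
    # g(p) - 1 is the 0-based start of the p-th block of the directive word
    print(p, h(directive, sum(ds[:p - 1])) == s[p - 1])

fib = standard_words([1] * 7)                     # fib[-1] is a Fibonacci prefix
c = c_factorization(fib[-1])
print(c[3] == fib[2][::-1], c[4] == fib[3][::-1], c[5] == fib[4][::-1])
```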